
(88/314) 2064591 - XML plugin auto closes incorrect tag

jEdit (r13380), XML v2.0.9

With the attached file, when I try to close the '<head>' tag by typing '</', it closes the '<script>' tag on Line 7 instead. It's reading the '<script>' tags inside of the strings, which is incorrect.

Submitted: hunteke - 2008-08-21 - 11:36:41z
Assigned: kerik-sf
Priority: 5
Category: None
Status: Open
Group: None
Resolution: None
Visibility: No

Comments

2008-09-02 - 15:10:57z
kpouer
Logged In: YES
user_id=285591
Originator: NO

Hi, that's nice, but the problem is that your HTML document is not a valid XML document.
You could put your JavaScript code in XML comments; I think it will still work, and it will also fix the problem. I don't know if it is possible to fix that in the XML plugin, but wait for Alan's answer.
2008-09-02 - 17:58:04z
hunteke
Logged In: YES
user_id=1271235
Originator: YES

Alright, I'll buy that it's not a valid XML document. However, I'll also point out that the plugin has been useful to me in an HTML role as well, for instance by auto-closing tags, showing me matching tags, and enabling folds.

To be fair, I'll point out that the engine at http://validator.w3.org/ has a similar problem.

I will attach the file in two parts, for playing purposes. A base correct HTML document (but not XML), and a udiff to make it not work.
File Added: html_compliant.html
2008-09-02 - 18:23:27z
hunteke
Logged In: YES
user_id=1271235
Originator: YES

$ patch -o html_borked.html < html_diff.diff

After applying this patch to the previous attachment (html_compliant.html), note that the '</script>' tag on line 12 incorrectly matches a string on line 10.


File Added: html_diff.diff
2010-05-16 - 17:56:35z
ezust
I see everything works fine with the "html-compliant.html" test file, as long as I am parsing using the HTML parser.
However, if I add the code that is indicated in the diff, then it seems the contents of the <script> </script> element confuse the XML parser and make it think there are a bunch of opened script tags that need to be closed.

This might be a bug in the HTML sidekick parser part of the XML plugin.

2010-05-17 - 18:42:40z
daleanson
Actually, the html sidekick delegates a lot to the xml sidekick. The problem seems to be in the XML sidekick code, specifically in xml.parser.TagParser.findStartTag(). This method just works backwards through the code until it finds a matching start tag and takes no account of the sidekick Asset, which gives the actual start and end of a tag.
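
To make the failure mode concrete, here is a minimal Python sketch of a purely textual backward scan. It is not the plugin's Java code: `find_unclosed_tag`, `TAG_RE`, and the two sample documents are hypothetical names and data invented for illustration, under the assumption that the scan looks only at raw text and knows nothing about string literals.

```python
import re

# Matches an opening tag like <head>, a closing tag like </head>,
# or a self-closing tag like <br/>. Purely textual: no notion of strings.
TAG_RE = re.compile(r"<(/?)([A-Za-z][\w:.-]*)[^<>]*?(/?)>")


def find_unclosed_tag(text, caret):
    """Return the name of the nearest unclosed start tag before caret, or None."""
    pending_closes = []  # close tags seen while walking backwards
    for m in reversed(list(TAG_RE.finditer(text, 0, caret))):
        is_close, name, self_closing = m.group(1), m.group(2), m.group(3)
        if is_close:
            pending_closes.append(name)
        elif self_closing:
            continue
        elif pending_closes and pending_closes[-1] == name:
            pending_closes.pop()  # this start tag pairs with a later close tag
        else:
            return name  # first start tag with no close tag after it
    return None


clean = "<html>\n<head>\n<script>\nvar s = 1;\n</script>\n"
broken = '<html>\n<head>\n<script>\nvar s = "<script>";\n</script>\n'

print(find_unclosed_tag(clean, len(clean)))    # head
print(find_unclosed_tag(broken, len(broken)))  # script
```

In the second document the '<script>' inside the string literal pairs with the real '</script>', so the real '<script>' start tag looks unclosed and is offered for closing instead of '<head>'.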

I'll attach an example xml file that demonstrates the problem. Open the file in jEdit, open Sidekick and be sure you're parsing it as xml. You'll see the same problem as with the html file.
2010-05-18 - 07:30:06z
hunteke
I apologize for forgetting to get back on this bug. I'm having a devil of a time tracking down the report now, or even a reference, but I have a recollection that my original assumption was incorrect. I recall pointing out this "error" to the W3 folks (or someone), and was informed that my formulation was actually incorrect.

XML parsers correctly ignore the semantics that we humans assign to quotes. Thus, to do this type of document morphing, one needs to encode the left chevron ('<'). See attached file for one possible solution. (For completeness, I'll point out that I believe the style of building HTML fragments piecemeal via string operations like I was apparently doing in 2008 is frowned upon.)
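
As a small illustration of that point, the Python sketch below feeds the same script text to a standard XML parser twice, once raw and once with the chevrons encoded. It does not reproduce the attached file, whose contents are not shown here; the `js` string and the `parses` helper are assumptions made up for the example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical script body that builds markup inside a string literal.
js = 'var s = "<i>";'


def parses(xml_text):
    """Return True if xml_text is well-formed XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


raw = "<root><script>" + js + "</script></root>"
encoded = "<root><script>" + escape(js) + "</script></root>"

print(parses(raw))      # False: <i> inside the quotes is still a start tag
print(parses(encoded))  # True: &lt;i&gt; is plain character data

# The parser decodes the entities, so the original text is recovered.
print(ET.fromstring(encoded).find("script").text == js)  # True
```

Note that this holds for XML parsing only. An HTML parser treats the contents of a script element as raw text and does not decode entities there, so encoding with entities is not a drop-in fix for plain HTML.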

The point is that the XML or HTML plugin is actually performing the correct action.

Caveat emptor: as I haven't thought about this in over 18 months, I'm not sure that my recollection or above analysis is correct.
2010-05-18 - 07:39:27z
hunteke
Of course, *now* I look at your upload, Dale. Your example certainly /seems/ intuitive and explicit to me, but I'm afraid I don't know XML well enough to know for sure. I have a hunch, however, that the XML snippet is in fact not correct. My gut says you need to encode not just the left chevron, but the right one as well. Suggested differential attached.
2010-05-18 - 17:09:11z
kerik-sf
Dale: your example indeed doesn't parse as XML.
I don't agree about using the sidekick asset instead of TagParser, since the sidekick information is outdated most of the time (even if you use parse on keystroke, if there is an error before the point where you are editing). So I've found TagParser quite handy, since it uses only local context. This works most of the time, but not for <![CDATA[ some <tag> ]]> and not for the HTML script element.
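
One way a local-context scan could cope with CDATA is to blank those sections out before looking for tags. The Python sketch below shows the idea only; it is not something TagParser does, and `mask_cdata` and `CDATA_RE` are hypothetical names. Each section is replaced by spaces of the same length so that character offsets into the buffer stay valid.

```python
import re

# A CDATA section runs from "<![CDATA[" to the first "]]>" that follows it.
CDATA_RE = re.compile(r"<!\[CDATA\[.*?\]\]>", re.DOTALL)


def mask_cdata(text):
    """Replace every CDATA section with spaces of equal length."""
    return CDATA_RE.sub(lambda m: " " * len(m.group(0)), text)


sample = "<a><![CDATA[ some <tag> ]]></a>"
masked = mask_cdata(sample)

print(len(masked) == len(sample))  # True: offsets are preserved
print("<tag>" in masked)           # False: the fake tag is hidden
```

The same masking idea could be applied to the contents of an HTML script element, with the caveat that a section left open at the caret (no closing "]]>" yet) is not matched by this pattern and would need separate handling.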

Ignoring CDATA sections for now, as soon as you leave the <head> section, every element auto-closes correctly (OK, if you type "</" after </body>, you'll get back the </script>).
2010-05-18 - 17:30:42z
daleanson
Sorry to muddy the situation -- I wasn't saying the file I attached would parse correctly as XML since it won't. I was just trying to point out where the problem is. The problem is that the html sidekick delegates to the xml sidekick for tag matching. Often, html is not valid xml, which means the xml sidekick does not necessarily find the matching tag, as the original example and my example show.

Probably the right fix in this case is to have the html sidekick do its own tag matching.

Attachments

2008-09-02 - 17:58:04z
hunteke
html_compliant.html

HTML compliant, and correctly recognized by jEdit/XML plugin

2008-09-02 - 18:23:26z
hunteke
html_diff.diff

Diff against html_compliant.html to break XML plugin

2010-05-17 - 18:43:41z
daleanson
xml_sidekick_matching_example.xml

xml file that causes the same problem

2010-05-18 - 07:32:49z
hunteke
proper_method.html

Example of encoding the left chevron before use.

2010-05-18 - 07:40:23z
hunteke
diff_to_fix.diff

Encoding of chevrons to fix example XML file